A Supervised Method of Feature Weighting for Measuring Semantic Relatedness
Abstract
The clustering of related words is crucial for a variety of Natural Language Processing applications. Many known word-clustering techniques use the context of a word to determine its meaning: words which frequently appear in similar contexts are assumed to have similar meanings. Word clustering usually weights contexts by some measure of their importance. One of the most popular measures is Pointwise Mutual Information. It increases the weight of contexts where a word appears regularly but other words do not, and decreases the weight of contexts where many words may appear. Essentially, it is unsupervised feature weighting. We present a method of supervised feature weighting. It identifies contexts shared by pairs of words known to be semantically related or unrelated, and then uses Pointwise Mutual Information to weight these contexts by how well they indicate closely related words. We use Roget’s Thesaurus as a source of training and evaluation data. This work is a step towards adding new terms to Roget’s Thesaurus automatically, and doing so with high confidence.
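As a rough illustration of the unsupervised weighting the abstract describes, the sketch below computes standard Pointwise Mutual Information, PMI(w, c) = log2(P(w, c) / (P(w) P(c))), over a toy table of word-context counts. It is not the supervised method the paper presents; the function name `pmi_weights`, the dictionary layout, and the counts are all assumptions made up for this example.

```python
import math
from collections import Counter


def pmi_weights(pair_counts):
    """Return PMI for every (word, context) pair with a nonzero count.

    pair_counts maps (word, context) tuples to co-occurrence counts.
    This layout is an assumption made for the example.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n

    weights = {}
    for (word, context), n in pair_counts.items():
        if n == 0:
            continue  # PMI is undefined for pairs that never co-occur
        p_joint = n / total
        p_word = word_totals[word] / total
        p_context = context_totals[context] / total
        weights[(word, context)] = math.log2(p_joint / (p_word * p_context))
    return weights


# Toy counts (invented): "eat" is shared by both words,
# while "purr" and "bark" each occur with only one word.
counts = {
    ("cat", "purr"): 8,
    ("cat", "eat"): 4,
    ("dog", "bark"): 8,
    ("dog", "eat"): 4,
}
weights = pmi_weights(counts)
for pair, value in sorted(weights.items()):
    print(pair, round(value, 3))
```

In this toy table the context shared by both words ("eat") receives a lower weight than the contexts specific to one word ("purr", "bark"), which matches the behaviour the abstract attributes to Pointwise Mutual Information.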
Related resources
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing natural-language answers using computational methods and machine learning algorithms. The development of large-scale smart education systems on the one hand, and the importance of assessment as a key factor in the learning process and the challenges it faces on the other, have significantly increased the need for ...
Overlap-based feature weighting: The feature extraction of Hyperspectral remote sensing imagery
Hyperspectral sensors provide a large number of spectral bands. This massive and complex data structure of hyperspectral images presents a challenge to traditional data processing techniques. Therefore, reducing the dimensionality of hyperspectral images without losing important information is a very important issue for the remote sensing community. We propose to use overlap-based feature weigh...
Supervised Distributional Semantic Relatedness
Distributional measures of semantic relatedness determine word similarity based on how frequently a pair of words appear in the same contexts. A typical method is to construct a word-context matrix, then re-weight it using some measure of association, and finally take the vector distance as a measure of similarity. This has largely been an unsupervised process, but in recent years more work has...
Weighted feature line embedding for feature extraction of hyperspectral images
One of the most important preprocessing steps before the classification of hyperspectral images is supervised feature extraction. Because obtaining training samples is hard and time-consuming, the number of available training samples is limited. In this paper we propose a supervised feature extraction method that is efficient in small-sample-size situations. The proposed method, which is called weight...
Publication date: 2011